
add support for schema evolution #329

Merged: 28 commits, Jan 12, 2024

Conversation

@redaLaanait (Contributor) commented Nov 13, 2023

The main idea is to generate a composite schema by combining both read and write schemas:

  • The composite schema contains an optional field-level attribute, field.action, which can take one of two values: DRAIN or SET_DEFAULT.

  • field.action is consulted during record decoding, either to skip reading the encoded value or to force setting the default value.

  • To set the default value, I added a createDefaultDecoder function that handles the different schema types and ensures the default value is converted correctly.

  • I also changed the SchemaCompatibility.Compatible function to take field aliases into account.

fixes #20
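The field.action idea can be sketched as follows (a minimal illustration; Action, Drain, SetDefault, and decodeRecord are hypothetical names chosen here, not the PR's actual identifiers):

```go
package main

import "fmt"

// Action mirrors the field-level attribute described above.
type Action int

const (
	NoAction   Action = iota
	Drain             // read and discard the writer's encoded value
	SetDefault        // do not read; force the reader's default value
)

type field struct {
	name   string
	action Action
	def    any
}

// decodeRecord shows how field.action could steer the decoding of one record:
// drained fields are consumed and dropped, defaulted fields never touch the wire.
func decodeRecord(fields []field, read func() any) map[string]any {
	out := map[string]any{}
	for _, f := range fields {
		switch f.action {
		case Drain:
			_ = read() // field exists only in the writer schema
		case SetDefault:
			out[f.name] = f.def // field exists only in the reader schema
		default:
			out[f.name] = read()
		}
	}
	return out
}

func main() {
	vals := []any{int64(1), "dropped"}
	i := 0
	read := func() any { v := vals[i]; i++; return v }
	fields := []field{
		{name: "id", action: NoAction},
		{name: "legacy", action: Drain},
		{name: "tag", action: SetDefault, def: "none"},
	}
	fmt.Println(decodeRecord(fields, read)) // map[id:1 tag:none]
}
```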

@nrwiersma (Member) left a comment

Thanks for taking a stab at this. A few comments from a cursory review.

(review threads on codec_default.go, codec_record.go, and schema_compatibility.go)
@redaLaanait (Contributor Author)

Note: I can keep working on this unless you'd prefer to take it over.

@redaLaanait marked this pull request as draft on November 14, 2023
@redaLaanait (Contributor Author)

I tried the following approach to handle type promotion:

  • added an actual Type field to PrimitiveSchema
  • introduced codecPromoter (used by some native decoders) and readerPromoter (a reader wrapper used by the *Reader.readNext method)

The draft uses reflect.ValueOf on a few occasions... I wonder if this goes against the initial idea of using the reflect2 package.
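Roughly, the readerPromoter idea can be sketched like this (all names and the reader shape below are illustrative stand-ins, not the library's actual API):

```go
package main

import "fmt"

// Type is a stand-in for the library's schema type enum.
type Type string

const (
	Int  Type = "int"
	Long Type = "long"
)

// reader is a toy stand-in that serves pre-decoded values.
type reader struct {
	ints  []int32
	longs []int64
}

func (r *reader) ReadInt() int32  { v := r.ints[0]; r.ints = r.ints[1:]; return v }
func (r *reader) ReadLong() int64 { v := r.longs[0]; r.longs = r.longs[1:]; return v }

// readerPromoter decides at read time which raw read to perform,
// based on the writer schema's actual type.
type readerPromoter struct {
	actual Type
}

func (p *readerPromoter) ReadLong(r *reader) int64 {
	if p.actual == Int {
		return int64(r.ReadInt()) // int -> long promotion
	}
	return r.ReadLong()
}

func main() {
	r := &reader{ints: []int32{7}}
	p := &readerPromoter{actual: Int}
	fmt.Println(p.ReadLong(r)) // 7
}
```

Note the branch runs on every read, which is the runtime cost discussed below.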

@redaLaanait force-pushed the feat/schema_evolution branch from 10b955d to 4ade51d on November 17, 2023
@nrwiersma (Member)

It's an interesting approach. I'll look at it in more detail this weekend. My first impression is that it is quite verbose, but I don't have a better idea at this point, so this is already great work.

@redaLaanait (Contributor Author)

Thanks for the quick feedback!

@nrwiersma (Member)

So I had some time to take a look at the last commits. First off, very nice progress here. It is not as verbose as I had previously thought. One thing I have noticed is that everything happens at runtime, which will carry a heavy performance penalty. However, we have all the information needed to move this from a runtime problem to a planning problem.

My suggestion is to change the codecs to use a specific "conversion" function, for example:

type longCodec[T largeInt] struct {
	convert func(*Reader) int64
}

func (c *longCodec[T]) Decode(ptr unsafe.Pointer, r *Reader) {
	*((*T)(ptr)) = T(c.convert(r))
}

This both cleans up the functions a little and means we know up front what this function looks like. It will still take a performance hit from calling the indirect function, but I suspect this should be quite small; it will need to be benchmarked. For string <-> byte conversion, there are unsafe ways to make this alloc-less.
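For reference, one alloc-less string <-> byte conversion uses the unsafe helpers added in Go 1.20 (a sketch, not code from this PR; the usual caveat applies that the result shares memory with the input and must not be mutated):

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b's backing array as a string without copying.
// Safe only if b is never mutated afterwards.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes reinterprets s's bytes as a slice without copying.
// The returned slice must be treated as read-only.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("avro")
	fmt.Println(bytesToString(b)) // avro, no copy of the backing array
}
```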

The converter creators could look something like:

func createLongConverter(typ Type) (func(*Reader) int64, error) {
	switch typ {
	case Int:
		return func(r *Reader) int64 { return int64(r.ReadInt()) }, nil
	case Long:
		return func(r *Reader) int64 { return r.ReadLong() }, nil
	default:
		return nil, fmt.Errorf("cannot promote from %q to %q", typ, Long)
	}
}

Leaving the planning to look something like this:

case reflect.Uint32:
	if schema.Type() != Long {
		break
	}
	convert, err := createLongConverter(actual)
	if err != nil {
		return &errorDecoder{err: fmt.Errorf("avro: %w", err)}
	}
	return &longCodec[uint32]{convert: convert}

Thoughts?

@redaLaanait (Contributor Author)

Your solution looks cleaner and doesn't use reflect.ValueOf, so as you said, it should perform better.

Regarding the actual field, it might be empty. In this case, we might need to make the convert func(*Reader) int64 function optional and handle its nil value in the decode function call. (I'm not sure if this is the right approach from a performance point of view.)

Another point (a bit opinionated) I was thinking about is that the codecs' native decoders and Reader.readNext are not orthogonal; they implement similar logic, which leads to duplicating the new type promotion logic on both sides. IMO, it would be nice to find a way to reuse the native decoders and, somehow, replace or refactor Reader.readNext.

@nrwiersma (Member)

Regarding the actual field, it might be empty.

Fair point. Either the nil value can be handled, or the empty actual can be treated as a non-conversion. Both work.
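Treating an empty actual as a non-conversion could look like this (a sketch reusing the shape of the createLongConverter proposal above; Type, NoType, and the toy reader are stand-ins, not the library's real API):

```go
package main

import "fmt"

// Type stands in for the library's schema type enum; the zero value
// represents an absent actual type, i.e. no promotion.
type Type string

const (
	NoType Type = ""
	Int    Type = "int"
	Long   Type = "long"
)

type reader struct{ longs []int64 }

func (r *reader) ReadLong() int64 { v := r.longs[0]; r.longs = r.longs[1:]; return v }
func (r *reader) ReadInt() int32  { return 0 }

// createLongConverter always returns a non-nil converter: an empty
// actual type is handled as a plain (non-promoted) long read, so the
// codec never has to check for nil at decode time.
func createLongConverter(typ Type) (func(*reader) int64, error) {
	switch typ {
	case NoType, Long:
		return func(r *reader) int64 { return r.ReadLong() }, nil
	case Int:
		return func(r *reader) int64 { return int64(r.ReadInt()) }, nil
	default:
		return nil, fmt.Errorf("cannot promote from %q to %q", typ, Long)
	}
}

func main() {
	convert, _ := createLongConverter(NoType)
	fmt.Println(convert(&reader{longs: []int64{42}})) // 42
}
```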

IMO, it would be nice to find a way to reuse codec natives decoders and, somehow, replace/refactor Reader.readNext.

I tend to disagree. The Reader is responsible for the binary decoding of the Avro data, while the codecs are responsible for decoding the Schema, as it were, into Go types. This separation makes each part much simpler and reusable. As promotion lives at the schema level rather than the binary level, the changes should happen at the codec level. The codec, however, has two parts: 1) reading the correct binary data from the Reader, and 2) converting it to the correct Go type, and this is what we see in the native promotion code.

@redaLaanait (Contributor Author)

My point is specific to the Reader's readNext method. I think it already does something related to your point 2): converting to the correct Go type.

@nrwiersma (Member)

Ah, I misunderstood. You are correct here. Generic reading should be at the codec level; it would be more consistent.

@redaLaanait (Contributor Author)

I’ll update the draft first to consider your proposed approach, then figure out how to handle the readNext case.

@redaLaanait force-pushed the feat/schema_evolution branch 3 times, most recently from 1877e31 to e89e8c8 on November 25, 2023
@redaLaanait force-pushed the feat/schema_evolution branch from e89e8c8 to 845d478 on November 25, 2023
@redaLaanait (Contributor Author)

I integrated the "convert function idea" into my draft, and ended up moving away from relocating the readNext logic to the codec level, to keep the focus on schema evolution.

If the type promotion handling design looks good, then the remaining central point to fix is handling the default value at the decoder level.

I made some cleanups and slightly reduced verbosity by relying on the fact that the validateDefault function normalizes the default value's type.

Any thoughts or suggestions for improvement or a better solution are welcome!

@redaLaanait (Contributor Author)

To avoid duplicating the type promotion and default value logic in the case of the dynamic decoder, I tried to replace Reader.ReadNext with the following:

Replace:

pObj := (*any)(ptr)
obj := *pObj
*pObj = r.ReadNext(d.schema)

With:

pObj := (*any)(ptr)
obj := *pObj
receiverPtr, receiverTyp, err := dynamicReceiver(d.schema, r.cfg.resolver)
if err != nil {
	r.ReportError("Read", err.Error())
	return
}
decoderOfType(r.cfg, d.schema, receiverTyp).Decode(receiverPtr, r)
*pObj = receiverTyp.UnsafeIndirect(receiverPtr)

Benchmark:

ReadNext

BenchmarkDecoderInterface/Empty_Interface-8                41175             29783 ns/op           16453 B/op        253 allocs/op
BenchmarkDecoderInterface/Interface_Non-Ptr-8              41558             29821 ns/op           16453 B/op        253 allocs/op
BenchmarkDecoderInterface/Interface_Nil_Ptr-8              41488             28455 ns/op           16101 B/op        250 allocs/op
BenchmarkDecoderInterface/Interface_Ptr-8                  41211             28406 ns/op           16101 B/op        250 allocs/op

Dynamic Receiver

BenchmarkDecoderInterface/Empty_Interface-8                34342             33980 ns/op           17259 B/op        287 allocs/op
BenchmarkDecoderInterface/Interface_Non-Ptr-8              34971             33879 ns/op           17259 B/op        287 allocs/op
BenchmarkDecoderInterface/Interface_Nil_Ptr-8              41997             27843 ns/op           16101 B/op        250 allocs/op
BenchmarkDecoderInterface/Interface_Ptr-8                  42182             27798 ns/op           16101 B/op        250 allocs/op

(review threads on schema.go)
@redaLaanait force-pushed the feat/schema_evolution branch from b8aac9b to 84bdd49 on December 11, 2023
@redaLaanait force-pushed the feat/schema_evolution branch from c17c6aa to 7db00da on December 17, 2023
@redaLaanait (Contributor Author)

The draft looks good to me and can be turned into a PR.

One point I haven't received your feedback on is the replacement of readNext at the codec level. It's implemented in codec_generic.go and runs against the same test suite as reader_generic.go.

@redaLaanait marked this pull request as ready for review on December 21, 2023
@redaLaanait changed the title from "add support for schema evolution (POC)" to "add support for schema evolution" on Dec 21, 2023
@nrwiersma (Member)

One point I haven't received your feedback on is the replacement of readNext at the codec level. It's implemented in codec_generic.go and runs against the same test suite as reader_generic.go.

I see the need here, because of the compatibility changes. I would prefer it in a separate PR though, just to reduce the change scope a little, and make it simpler to review.

@nrwiersma (Member) left a comment

First off, great work 🎉 Sorry for the delay, I have been away.
I am doing this review in stages as I have time, as it is quite large.
This is the first round.

(review threads on schema.go and resolver.go)
@redaLaanait (Contributor Author)

Take your time with the review, and be merciless 🙂. It's a large PR in which errors could slip through.

(review threads on schema.go, converter.go, and codec_default.go)
@nrwiersma (Member) left a comment

LGTM 🎉

@nrwiersma merged commit 589f785 into hamba:main on Jan 12, 2024 (2 of 3 checks passed)
@redaLaanait (Contributor Author)

I think this is an issue. The reader will not reset on the second read,

I misused borrowReader, but it was strange that my tests didn't catch this issue.

That's because the generic decoding bypasses the cache and instantiates a new decoder on each call: efaceDecoder.Decode -> genericDecode -> decoderOfType.

I'm not sure if I should do something in this regard.

@nrwiersma (Member)

I am not super stressed. There is a lot in the lib that is difficult, if not impossible, to test.
